ABSTRACT

The Portland Board of Education had requested that
the Oregon Central Evaluation Department provide student achievement
data so as to allow comparisons with other school districts by
reporting national grade level equivalent (GLE) scores on
standardized tests of reading and mathematics for grades 4 and 8. For
years, the position of most research and evaluation personnel in
Portland's district has been that national GLEs are an inadequate and
misleading type of score for representing student achievement in the
district. This position has been based on information about the
discrepant meaning of GLEs from test to test and also upon certain
technical characteristics of these scores that might make them
unsuitable for research and evaluation purposes. This paper discusses
the advantages, disadvantages, differences in variations,
interpretations, interpolations and alternatives to reporting GLEs
and other standardized scores. (Author/DEP)
The Appropriate and Inappropriate Uses
Of Grade Level Equivalents In School Evaluation
Like the discussion of the other topics in this Division H symposium,
the present review of the question of whether to report test data in terms
of Grade Level Equivalents arose out of a situation in a school district
which may find a parallel in the experience of some other members of AERA.
It is hoped that this discussion will help toward the creation and sharing
of workable solutions to common research and evaluation problems, including
their real and important political and human dimensions.

The Problem

As late as 1973 the Portland, Oregon Central Evaluation Department
found itself responding to the compellingly expressed need of its Board of
Education for "data on student achievement allowing comparison with other
school districts" by reporting national Grade Level Equivalent scores on
standardized tests of Reading and Mathematics at grades 4 and 8 (see Figure 1).
This occurred in spite of the fact that Portland had been one of the first
cities in the country to move to locally developed and normed tests, having
completed development of such a program well before 1970. It also transpired
in the face of continuing efforts to inform board members and other district
leaders of the limitations of national Standardized Tests in general and
Grade Level Equivalents as a means of reporting their results in particular.

Drs. Mazer and Hansen have already reviewed some of the reasons urged
against national Standardized Tests which led the district to return in 1974
to reporting standard scores on locally developed and normed tests for our
district wide testing program (see Figure 2). And you are all familiar with
the limitations and merits of Grade Level Equivalents since they have been
well and frequently documented (Flanagan, 1951; Coleman, 1970; Thorndike,
1971; Davis, 1974). Nevertheless, having this information recounted again
in terms which helped one district toward a better testing system may help
others in similar situations. And a report of some efforts to discover and
develop even more responsive measuring and reporting systems than those
currently available may be of even greater interest.

Method of Derivation of the Grade Equivalent Scale

The process of deriving a Grade Equivalent scale is commonly begun with
a test, usually an achievement test, being given to large and hopefully
representative groups of students in the consecutive grades for which it is
desired to report the Grade Equivalents. The test is administered at the
same time of year for all pupils, usually at the end of the year. The average
raw score of each grade level is then found and plotted against grade
level. Next, a curve is fitted and smoothed to connect the points thus
plotted. Often the curve is extrapolated to cover upper and lower grades.
Finally, tables of the raw scores paired with each tenth of a Grade are
prepared.
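
The derivation steps just described can be sketched in a few lines of code.
This is an illustration only, not any publisher's actual procedure: the grade
means are invented numbers, simple linear interpolation stands in for the
fitted and smoothed curve, no extrapolation is attempted, and the names
`build_ge_table` and `raw_to_ge` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical end-of-year mean raw scores by grade (invented for illustration).
GRADE_MEANS = {3.9: 18.0, 4.9: 26.0, 5.9: 32.0, 6.9: 36.0, 7.9: 38.5}


def build_ge_table(grade_means, step=0.1):
    """Pair each tenth of a grade with a linearly interpolated raw score."""
    grades = sorted(grade_means)
    n_steps = round((grades[-1] - grades[0]) / step)
    table = []
    for i in range(n_steps + 1):
        ge = round(grades[0] + i * step, 1)
        # Find the pair of tested grades that brackets this grade equivalent.
        j = max(1, min(bisect_right(grades, ge), len(grades) - 1))
        g0, g1 = grades[j - 1], grades[j]
        s0, s1 = grade_means[g0], grade_means[g1]
        raw = s0 + (s1 - s0) * (ge - g0) / (g1 - g0)
        table.append((ge, round(raw, 2)))
    return table


def raw_to_ge(raw, table):
    """Return the grade equivalent whose tabled raw score is closest to raw."""
    return min(table, key=lambda row: abs(row[1] - raw))[0]


table = build_ge_table(GRADE_MEANS)
```

With these invented means, one grade of "growth" corresponds to 8 raw score
points between 3.9 and 4.9 but to only 2.5 points between 6.9 and 7.9, so the
tenths of a grade in such a table need not be equal units.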

Disadvantages

Many of the possible and actual limitations of Grade Level Equivalents
arise from the way in which these scales are commonly established, and still
other limitations arise from the nature of the scale itself. The limitations
result in such disadvantages as the five listed below.

1. Interpretation - The naive interpretation is often wrong, e.g. a sixth
grader scoring at the eighth grade level is probably not "performing" at
the level of an average eighth grader in the sense that he or she knows
about the same things about as well as the more advanced student. He or
she is, however, probably performing exceptionally well on the items dealing
with sixth grade matter.

2. Uniform Growth and Emphasis - Within a subject the units of measurement
do not represent reasonably equivalent amounts of subject matter being
measured, e.g. "a gain from a grade-equivalent score of 6.9 to 7.9 (on a
test of Arithmetic Computation) indicates that a student has improved about
thirteen times as much as a grade-equivalent score of 1.9 to 2.9."¹ Moreover,
to the extent that the assumption that the same curriculum and consistent
emphasis are shared within a subject by the norm and test groups is violated,
any comparisons between these two and among test groups using Grade
Equivalents are invalidated.

3. Differences in Variation - From subject to subject the same Grade Level
Equivalents mean different things, e.g., a fifth grade student receiving a
Grade Equivalent score of 7.0 on a test of arithmetic may stand at the
ninety-fifth percentile relative to his grade group, whereas the same
result on another test, say of reading, where correlation between grade
and test score is lower, may indicate only a standing at the sixtieth
percentile.
1. Davis, Frederick B. Educational Measurements and Their Interpretation.
Belmont, California: Wadsworth Publishing Company, 1964, p. 40.
4. Interpolation - It is common to interpolate between testing groups by
fitting and smoothing a curve between the plotted points. This process
involves the application of questionable assumptions about the nature and
course of learning. Grade norms are most appropriate only for elementary
school subjects which are studied continuously at fairly commonly increasing
levels of difficulty over the grades. Grade Equivalents should never
extend beyond the ninth grade since there is little continuous and systematic
instruction beyond that grade for the subjects taught in elementary
school.

5. Extrapolation - It is also common to extrapolate from the curve to low
and high grades. To the extent that this is the case, reported scores are
almost worthless due to unreliability and invalidated judgment.
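
Disadvantage 3 above can be illustrated numerically. The sketch below assumes
that Grade Equivalent scores within a grade are roughly normally distributed
and uses invented within-grade standard deviations for two hypothetical tests;
neither assumption comes from the tests discussed here, and the function name
`percentile_within_grade` is hypothetical.

```python
from statistics import NormalDist


def percentile_within_grade(ge_score, grade_mean_ge, grade_sd_ge):
    """Percentile rank of a GE score within its own grade, assuming normality."""
    return 100.0 * NormalDist(grade_mean_ge, grade_sd_ge).cdf(ge_score)


# Hypothetical fifth grade group with a mean GE of 5.0 on both tests.
# The spreads (in GE units) are invented: the test whose scores rise more
# slowly with grade has the wider within-grade spread on the GE scale.
arithmetic = percentile_within_grade(7.0, 5.0, 1.2)
reading = percentile_within_grade(7.0, 5.0, 2.5)
```

Under these assumptions the same Grade Equivalent of 7.0 places the student
near the ninety-fifth percentile on the first test but only in the upper
seventies on the second. The exact figures depend entirely on the assumed
spreads.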

Such disadvantages as those listed above have led the authors and
editors of Standards for Educational and Psychological Tests to make some
strong warnings about the use of Grade Equivalents. These include:

D5.23 "Interpretive scores which lend themselves to gross misinterpretations
such as mental age or grade equivalent scores, should be abandoned
or their use discouraged." (italics added)

AND J5.2 "Test users should avoid the use of terms such as I.Q., I.Q.
equivalent, or grade equivalent where other terms provide more meaningful
interpretations of a score." (italics added)

An analysis by Dr. George Ingebo (Evaluation Specialist in Portland's
Area III) of the recent report of the ANCHOR Test Study provides a means
of verification of the impact of the technical and practical limitations of
Grade Level Equivalent scores. Figure 3 is a table showing the discrepancies
between the Grade Level Equivalents reported among four well known and
widely used standardized tests. In the table is reported the discrepancy
between the Grade Equivalents for the Fifth Grade California Test of Basic
Skills (CTBS) and the Iowa Test of Basic Skills (ITBS), the Metropolitan
Achievement Test (MAT) and the Stanford Achievement Test (SAT). It seems
apparent from this data that foreknowledge of even very roughly where a
majority of students might score (low, medium, high) would allow an
unscrupulous test director to improve his or her district's apparent
performance by as much as two Grade Equivalents.

Advantages

One positive thing is occasionally said about Grade Level Equivalents.
Even though test users are constantly misinterpreting Grade Equivalents in
the ways we have been describing, nevertheless they like these scores
because of their apparent familiarity, simplicity and directness of meaning.
Grade Level Equivalents seem, in short, more easily understood.

When we consider that this apparent understandability is in fact
largely merely apparent and that the choice of the Grade Equivalent scale
is often a choice to "misunderstand in comfort" rather than to make the
additional effort necessary to understand correctly, then even this sole
positive thing to be said about Grade Level Equivalents does not weigh very
compellingly in favor of their use.

Alternatives 

Traditional alternatives to Grade Level Equivalents have included
percentile rank within grade score, Z-scores, K-scores, stanines, etc. All
of these scores with the aid of good reporting techniques are capable of
being rendered as apparently understandable as the Grade Level Equivalent
without the dangers of misinterpretation inherent in that form of conversion
(refer again to Figures 1 and 2 for a comparison of the understandability of
Grade Level Equivalents and standard scores when embedded in a well designed
graphic reporting format).
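
Three of the traditional alternatives named above can be computed directly
from a grade group's raw scores. The sketch below is a minimal illustration
under stated assumptions: the function names are hypothetical, the group
standard deviation is taken over the whole group, and stanines are
approximated as 2z + 5 rather than assigned from the percentile bands that
define them, which is reasonable only when scores are roughly normal.

```python
import math
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Standard scores: distance from the group mean in SD units."""
    m, s = mean(raw_scores), pstdev(raw_scores)
    return [(x - m) / s for x in raw_scores]


def percentile_ranks(raw_scores):
    """Percent of the group scoring below each score, counting half of ties."""
    n = len(raw_scores)
    ranks = []
    for x in raw_scores:
        below = sum(1 for y in raw_scores if y < x)
        ties = sum(1 for y in raw_scores if y == x)
        ranks.append(100.0 * (below + 0.5 * ties) / n)
    return ranks


def stanines(raw_scores):
    """Approximate stanines as 2z + 5, rounded half up and clipped to 1..9."""
    return [min(9, max(1, math.floor(2 * z + 5 + 0.5)))
            for z in z_scores(raw_scores)]


scores = [10, 20, 30, 40, 50]  # invented raw scores for one grade group
```

Each of these scores locates a student relative to his or her own grade
group, so none of them invites the reading that a student is "performing at"
some other grade.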

In Portland exploration of another alternative is underway, an approach
to testing based upon the Rasch model. That model may provide for interval
scaling of both test scores and individual test items on the underlying trait
being measured. Work is currently in progress toward the building up of a pool
of items calibrated by the model through the cooperation of a number of districts
in the Northwest Evaluation Association and toward a simultaneous verification
of the validity of the model. The existence of such a pool of calibrated items
related to the comprehensive set of learning outcomes developed by the Tri-county
Goal Development Project would allow accurate reporting of student progress
toward goals set at the classroom and individual student level, thus meeting
the instructional purposes of measurement.² It would simultaneously permit
comparable reports of aggregate student performance at the building, area and
district levels, thus satisfying the administrative and management uses of
testing. Moreover, although the Rasch approach does not provide norms itself,
the capability to equate test results through this technique makes it possible
to take advantage of available norming information when and if such information
should also be required for further administrative and management purposes.
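
For readers unfamiliar with the Rasch model, the following is a minimal sketch
of its textbook form, not a description of the Portland or Northwest
Evaluation Association procedures. It assumes item difficulties have already
been calibrated on a common logit scale; the function names and the
Newton-Raphson estimation routine are illustrative choices.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch model: probability of a correct response (both values in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate by Newton-Raphson.

    responses are 0/1 item scores; difficulties are the calibrated
    difficulties of the same items. A zero or perfect score has no
    finite estimate, so it is rejected.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("no finite estimate for a zero or perfect score")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw - sum(probs)  # observed minus expected raw score
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta


# The same raw score on a harder set of calibrated items yields a higher
# ability estimate on the common scale (difficulties here are invented).
easier_form = estimate_ability([1, 1, 0], [-1.0, 0.0, 1.0])
harder_form = estimate_ability([1, 1, 0], [0.0, 1.0, 2.0])
```

Because ability and difficulty sit on one scale, students who take different
sets of calibrated items can still be reported comparably, which is the
property the equating argument above relies on.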



2. Doherty, Victor W. and Walter E. Hathaway, Designing behavioral goals,
K-12. Oregon Association for Supervision and Curriculum Development,
Curriculum Bulletin, Volume 27, No. 320, December, 1973.
Conclusion

There are very few cases where the numerous assumptions which must be
met in order for Grade Level Equivalents to be free of serious distortion are
in fact satisfied. In view of this it seems best to avoid the use of these
conversions entirely. With a little care existing derived scales which are
relatively free from at least some of the dangers inherent in Grade Level
Equivalents can be rendered similarly "understandable" to users. Current
explorations of such promising approaches as the Rasch model may lay the
groundwork for valid comparisons among locally autonomous programs while at
the same time providing needed information on the progress of individuals and
groups of students toward attaining the specific learning outcomes sought
within those programs.